feat: add Codex parity, discuss-phase scouting, and agent quality guards by Tibsfox · Pull Request #811 · gsd-build/get-shit-done

Tibsfox · 2026-02-28T10:42:18Z

Summary

Three feature enhancements that improve Codex runtime parity, discuss-phase intelligence, and agent execution quality:

Codex multi-agent config: Full request_user_input mapping in skill adapter, config.toml generation with per-agent .toml files, agent role headers, sandbox mode assignment, and clean uninstall support
Code-aware discuss phase: Codebase scouting step before gray area identification, code-context annotations in options, <code_context> section in CONTEXT.md template
Agent quality guards: Analysis paralysis guard (gsd-executor), exhaustive PROJECT.md cross-check (gsd-plan-checker), task-level TDD with <behavior> blocks (gsd-planner)

Motivation

Codex parity gap: The Codex runtime adapter had a minimal skill header with no AskUserQuestion mapping and no multi-agent configuration. GSD workflows that rely on interactive questioning (discuss-phase) or agent spawning (execute-phase) could not function on Codex. Agent .md files were converted with basic markdown transforms but lacked the <codex_agent_role> header and per-agent .toml sandbox configs that Codex requires for proper isolation.

Blind discuss-phase: The discuss-phase workflow generated gray areas purely from the ROADMAP.md phase description without examining the actual codebase. This meant it could not suggest reusing existing components, highlight established patterns, or annotate options with code context -- leading to decisions that ignored what was already built.

Agent execution drift: Three recurring failure modes in production:

Executors entering read-loops (5+ consecutive Read/Grep/Glob calls) without writing code, consuming context budget on analysis instead of implementation
Plan-checker verifying only the phase goal's requirements while silently dropping broader PROJECT.md requirements relevant to the phase
Code-producing tasks in standard plans lacking test expectations, causing executors to write implementation-first code without behavioral contracts

Changes

Commit 1: `bf26f95` -- Codex request_user_input, multi-agent config, agent role generation

bin/install.js (+298 lines):

CODEX_AGENT_SANDBOX map: 11 agents with sandbox modes (9 workspace-write, 2 read-only)
getCodexSkillAdapterHeader(): Expanded from 6-line stub to three structured sections:
- Section A: Skill invocation syntax ($skillName, {{GSD_ARGS}})
- Section B: AskUserQuestion to request_user_input parameter mapping (header, question, options, multiSelect workaround, Execute mode fallback)
- Section C: Task() to spawn_agent mapping (agent_type, fork_context, parallel wait pattern, result markers, close_agent)
convertClaudeAgentToCodexAgent(): Adds <codex_agent_role> header with role/tools/purpose, cleans frontmatter (drops color/tools fields, quotes name/description)
generateCodexAgentToml(): Per-agent .toml with sandbox_mode and developer_instructions from agent body
generateCodexConfigBlock(): Generates [features] (multi_agent, default_mode_request_user_input) and [agents] (max_threads=4, max_depth=2) with per-agent sections referencing .toml config files
stripGsdFromCodexConfig(): Clean removal of GSD sections during uninstall (handles marker-based, injected keys, and [agents.gsd-*] sections)
mergeCodexConfig(): Three-case merge (new file, existing with marker, existing without marker with feature injection)
installCodexConfig(): Orchestrates agent discovery, .toml generation, and config merge
Test-mode export gate (GSD_TEST_MODE env var) for module-level testing without CLI side effects
Uninstall path: removes agent .toml files and cleans config.toml
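
As a rough illustration of how the sandbox map and per-agent `.toml` generation described above could fit together, here is a minimal sketch. It is not the code from `bin/install.js`: the three agent entries and their modes, and the helper names `sandboxModeFor` and `renderAgentToml`, are assumptions made for the example. Only the `sandbox_mode` and `developer_instructions` keys and the `workspace-write` / `read-only` values come from the PR text.

```javascript
// Sketch only: these entries are illustrative assumptions, not a copy of the
// real 11-entry CODEX_AGENT_SANDBOX map.
const AGENT_SANDBOX = {
  'gsd-executor': 'workspace-write',
  'gsd-planner': 'workspace-write',
  'gsd-plan-checker': 'read-only',
};

// Agents missing from the map fall back to the most restrictive mode.
function sandboxModeFor(agentName) {
  return Object.hasOwn(AGENT_SANDBOX, agentName)
    ? AGENT_SANDBOX[agentName]
    : 'read-only';
}

// Render a minimal per-agent TOML document. JSON.stringify is used to produce
// a quoted, escaped string; for ordinary text its escapes (\n, \", \\) are
// also valid inside a TOML basic string.
function renderAgentToml(agentName, agentBody) {
  return [
    `sandbox_mode = ${JSON.stringify(sandboxModeFor(agentName))}`,
    `developer_instructions = ${JSON.stringify(agentBody.trim())}`,
    '',
  ].join('\n');
}

console.log(renderAgentToml('gsd-plan-checker', 'Verify plans.\nNever edit files.'));
```

Defaulting unknown agents to `read-only` means a newly added agent cannot write to the workspace until someone classifies it explicitly.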

tests/codex-config.test.cjs (+412 lines, 30 tests across 8 suites):

getCodexSkillAdapterHeader: Section presence, invocation syntax, parameter mapping, spawn_agent mapping
convertClaudeAgentToCodexAgent: Frontmatter cleanup, slash command conversion, no-frontmatter passthrough
generateCodexAgentToml: Sandbox mode assignment (workspace-write, read-only, default), developer_instructions embedding
CODEX_AGENT_SANDBOX: Agent count (11), write/read-only classification
generateCodexConfigBlock: Marker, feature flags, agent limits, per-agent sections
stripGsdFromCodexConfig: GSD-only removal, user content preservation, injected key stripping, empty section cleanup, [agents.gsd-*] removal
mergeCodexConfig: Three cases + idempotency + existing [features] injection
installCodexConfig (integration): End-to-end with real agent files
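
The suites above call installer functions directly, which relies on the `GSD_TEST_MODE` export gate mentioned earlier. A minimal sketch of that pattern follows; `agentTomlName` and `main` are hypothetical stand-ins, not functions from `bin/install.js`.

```javascript
// Stand-in helper; the real exports are the functions listed in this PR.
function agentTomlName(agentName) {
  return `${agentName}.toml`;
}

// Hypothetical CLI entry point with a visible side effect.
function main() {
  console.log(`would install ${agentTomlName('gsd-executor')}`);
}

if (process.env.GSD_TEST_MODE) {
  // Under test: expose internals and skip the CLI entirely.
  module.exports = { agentTomlName };
} else {
  main();
}
```

Under this pattern a test file would presumably set `process.env.GSD_TEST_MODE` before calling `require` on the installer, so that loading the module has no install side effects.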

Commit 2: `0dc8120` -- Code-aware discuss phase with codebase scouting (#727)

commands/gsd/discuss-phase.md:

Added Glob and Grep to allowed-tools (needed for codebase scouting)
Updated process steps: scout codebase before analysis, code-informed gray areas, code-context in CONTEXT.md
Added Task and mcp__context7__* to allowed-tools for auto-advance and library documentation lookup

get-shit-done/workflows/discuss-phase.md (+77 lines):

New <step name="scout_codebase"> between check_existing and analyze_phase:
- Checks .planning/codebase/*.md maps first (CONVENTIONS, STRUCTURE, STACK)
- Falls back to targeted grep using phase goal terms
- Builds internal <codebase_context> (reusable assets, established patterns, integration points, creative options)
Updated analyze_phase to use codebase_context for grounded analysis
Updated present_gray_areas with code context annotation examples
Updated discuss_areas with code-context-annotated option examples and Context7 library lookup
Updated write_context to include <code_context> section (reusable assets, established patterns, integration points)
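
The scout step is a workflow instruction for the agent, not executable code, but its "maps first, grep as fallback" order can be sketched in JavaScript. The function `scoutCodebase`, its return shape, and the naive substring search standing in for targeted grep are all assumptions made for illustration.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: prefer pre-built codebase maps; otherwise fall back to
// a naive term search over files (a stand-in for targeted grep).
function scoutCodebase(root, goalTerms) {
  const mapDir = path.join(root, '.planning', 'codebase');
  const maps = fs.existsSync(mapDir)
    ? fs.readdirSync(mapDir).filter((f) => f.endsWith('.md')).sort()
    : [];
  if (maps.length > 0) {
    return { source: 'maps', files: maps.map((f) => path.join('.planning', 'codebase', f)) };
  }

  const hits = [];
  const walk = (dir) => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      // Skip dependency and hidden directories to keep the scan lightweight.
      if (entry.name === 'node_modules' || entry.name.startsWith('.')) continue;
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile()) {
        const text = fs.readFileSync(full, 'utf8').toLowerCase();
        if (goalTerms.some((term) => text.includes(term.toLowerCase()))) {
          hits.push(path.relative(root, full));
        }
      }
    }
  };
  walk(root);
  return { source: 'grep', files: hits.sort() };
}

// Demo in a throwaway directory: first without maps, then with one.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'scout-'));
fs.mkdirSync(path.join(demoRoot, 'src'));
fs.writeFileSync(path.join(demoRoot, 'src', 'auth.js'), 'function login() {}');
fs.writeFileSync(path.join(demoRoot, 'src', 'util.js'), 'function noop() {}');
const grepResult = scoutCodebase(demoRoot, ['login']);

fs.mkdirSync(path.join(demoRoot, '.planning', 'codebase'), { recursive: true });
fs.writeFileSync(path.join(demoRoot, '.planning', 'codebase', 'STACK.md'), '# Stack');
const mapResult = scoutCodebase(demoRoot, ['login']);

console.log(grepResult.source, mapResult.source);
```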

get-shit-done/templates/context.md (+14 lines):

Added <code_context> section template with Reusable Assets, Established Patterns, and Integration Points subsections
Updated good examples to show code context usage

Commit 3: `9124906` -- Analysis paralysis guard, exhaustive cross-check, task-level TDD (#736)

agents/gsd-executor.md (+10 lines):

New <analysis_paralysis_guard> section: After 5+ consecutive Read/Grep/Glob calls without Edit/Write/Bash, executor must stop and either write code or report "blocked" with the specific missing information
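
The guard itself is prompt text that the executor follows, not a program. To make the counting rule concrete, the sketch below models it over a list of tool names. The function name, and the choice that tools outside both groups neither extend nor reset the streak, are assumptions.

```javascript
const READ_ONLY_TOOLS = new Set(['Read', 'Grep', 'Glob']);
const ACTION_TOOLS = new Set(['Edit', 'Write', 'Bash']);
const THRESHOLD = 5;

// Returns the index of the call at which the guard would fire, or -1 if the
// sequence never reaches THRESHOLD consecutive read-only calls.
function firstParalysisIndex(toolCalls) {
  let streak = 0;
  for (let i = 0; i < toolCalls.length; i++) {
    if (READ_ONLY_TOOLS.has(toolCalls[i])) {
      streak += 1;
      if (streak >= THRESHOLD) return i;
    } else if (ACTION_TOOLS.has(toolCalls[i])) {
      streak = 0; // any Edit/Write/Bash resets the count
    }
  }
  return -1;
}

console.log(firstParalysisIndex(['Read', 'Grep', 'Glob', 'Read', 'Read'])); // 4
console.log(firstParalysisIndex(['Read', 'Grep', 'Edit', 'Read', 'Glob', 'Grep', 'Read'])); // -1
```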

agents/gsd-plan-checker.md (+2 lines):

Exhaustive cross-check in Step 4 (Requirement Coverage): Also read PROJECT.md requirements, not just the phase goal. Any unmapped PROJECT.md requirement relevant to the phase is an automatic blocker
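
The cross-check rule amounts to a set difference: relevant PROJECT.md requirements minus the requirements the plan maps to tasks. A minimal sketch, assuming hypothetical data shapes (requirement ids, a `relevantToPhase` flag, and a per-task `requirements` array) that the plan-checker prompt does not actually define:

```javascript
// Hypothetical shapes: ids, relevantToPhase, and task.requirements are
// assumptions made for this sketch.
function findUnmappedRequirements(projectRequirements, planTasks) {
  const covered = new Set(planTasks.flatMap((task) => task.requirements));
  return projectRequirements
    .filter((req) => req.relevantToPhase && !covered.has(req.id))
    .map((req) => ({ severity: 'blocker', requirement: req.id, text: req.text }));
}

const projectRequirements = [
  { id: 'REQ-01', text: 'Users can log in', relevantToPhase: true },
  { id: 'REQ-02', text: 'Sessions expire after inactivity', relevantToPhase: true },
  { id: 'REQ-03', text: 'Export reports as CSV', relevantToPhase: false },
];
const planTasks = [{ name: 'Build login form', requirements: ['REQ-01'] }];

// REQ-02 is relevant but unmapped, so it is reported as a blocker.
console.log(findUnmappedRequirements(projectRequirements, planTasks));
```

In the sketch the relevance decision is reduced to a boolean flag. In the agent that decision is a judgment call, which is where false positives could arise.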

agents/gsd-planner.md (+20 lines):

Task-level TDD guidance: When a task creates/modifies production code, add tdd="true" and <behavior> block with explicit test expectations
XML example showing the full task structure with <behavior> element
Exception list: checkpoint tasks, config-only, docs, migrations, glue code, styling-only
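
To show how the `tdd="true"` attribute, the `<behavior>` block, and the exception list interact, here is a small sketch that renders a task element. Only `tdd="true"` and `<behavior>` are taken from the PR. The `<name>` child, the `kind` labels, and the `producesCode` flag are hypothetical, and the real example in `gsd-planner.md` may be structured differently.

```javascript
// Kinds exempt from task-level TDD, loosely following the exception list above.
const TDD_EXEMPT_KINDS = new Set([
  'checkpoint', 'config-only', 'docs', 'migration', 'glue', 'styling-only',
]);

function escapeXml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Render a task; add tdd="true" and a <behavior> block only for
// code-producing tasks that are not on the exemption list.
function renderTask(task) {
  const useTdd = task.producesCode && !TDD_EXEMPT_KINDS.has(task.kind);
  const lines = [useTdd ? '<task tdd="true">' : '<task>'];
  lines.push(`  <name>${escapeXml(task.name)}</name>`);
  if (useTdd) {
    lines.push('  <behavior>');
    for (const expectation of task.behavior) {
      lines.push(`    - ${escapeXml(expectation)}`);
    }
    lines.push('  </behavior>');
  }
  lines.push('</task>');
  return lines.join('\n');
}

console.log(renderTask({
  name: 'Add slugify helper',
  kind: 'feature',
  producesCode: true,
  behavior: ['slugify("Hello World") returns "hello-world"', 'empty input returns ""'],
}));
console.log(renderTask({ name: 'Update README', kind: 'docs', producesCode: false, behavior: [] }));
```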

Relationship to Other PRs

This is PR #4 of 6 from the dev-bugfix branch:

| PR | Title | Status | Dependency |
|---|---|---|---|
| #2 | Milestone completion bugs | Merged | None |
| #3 | Cross-platform Windows CI fixes | Open | None |
| #4 (this) | Codex parity, discuss scouting, agent guards | Open | None |
| #5 | Agent frontmatter + heredoc fix | Open | Merge #4 first |
| #6 | CLI/config bug fixes | Open | None |
| #1 | MCP migration helper | Open | Targets dev-bugfix |

PR #5 dependency: PR #5 adds agent frontmatter parsing improvements that build on the agent definitions modified here (gsd-executor, gsd-planner, gsd-plan-checker). No code conflicts, but they touch overlapping agent files. Recommend merging #4 before #5.

Testing

New Tests (codex-config.test.cjs)

| Suite | Tests | Focus |
|---|---|---|
| `getCodexSkillAdapterHeader` | 4 | Section presence, invocation syntax, AskUserQuestion mapping, Task mapping |
| `convertClaudeAgentToCodexAgent` | 3 | Frontmatter cleanup, slash command conversion, no-frontmatter passthrough |
| `generateCodexAgentToml` | 4 | Sandbox modes (workspace-write, read-only, default), developer_instructions |
| `CODEX_AGENT_SANDBOX` | 3 | Agent count (11), write classification, read-only classification |
| `generateCodexConfigBlock` | 4 | Marker, feature flags, agent limits, per-agent sections |
| `stripGsdFromCodexConfig` | 5 | GSD-only, user preservation, injected keys, empty sections, agent sections |
| `mergeCodexConfig` | 5 | Create new, replace existing, append without marker, inject features, idempotency |
| `installCodexConfig` (integration) | 1 | End-to-end with real agent `.md` files |
| Total new | 30 | |

Full Suite Results

# tests 449
# suites 87
# pass 449
# fail 0
# cancelled 0
# skipped 0
# duration_ms 7032

All 449 tests pass (30 new + 419 existing). Zero failures, zero skipped.

Impact

Codex users: Can now run the full GSD workflow on Codex -- discuss-phase questioning works via request_user_input, agent spawning works via spawn_agent, and each agent gets proper sandbox isolation through generated .toml configs
All runtimes: discuss-phase now scouts the codebase before asking questions, producing more relevant gray areas and code-aware options. CONTEXT.md captures reusable assets and integration points for downstream agents
Agent quality: Executor paralysis guard prevents context budget waste on read-loops. Plan-checker exhaustive cross-check catches silently dropped requirements. Task-level TDD ensures behavioral contracts exist before implementation
No breaking changes: All changes are additive. Existing workflows, agent behavior, and test infrastructure are unaffected
Files changed: 8 files, +845/-25 lines

…agent role generation

Expand Codex adapter with AskUserQuestion → request_user_input parameter mapping (including multiSelect workaround and Execute mode fallback) and Task() → spawn_agent mapping (parallel fan-out, result parsing).

Add convertClaudeAgentToCodexAgent() that generates <codex_agent_role> headers with role/tools/purpose and cleans agent frontmatter.

Generate config.toml with [features] (multi_agent, request_user_input) and [agents.gsd-*] role sections pointing to per-agent .toml configs with sandbox_mode (workspace-write/read-only) and developer_instructions.

Config merge handles 3 cases: new file, existing with GSD marker (truncate + re-append), existing without marker (inject features + append agents). Uninstall strips all GSD content including injected feature keys while preserving user settings.

Closes gsd-build#779

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Add lightweight codebase scanning before gray area identification:

- New scout_codebase step checks for existing maps or does targeted grep
- Gray areas annotated with code context (existing components, patterns)
- Discussion options informed by what already exists in the codebase
- Context7 integration for library-specific questions
- CONTEXT.md template includes code_context section

Co-authored-by: Claude Opus 4.6 <[email protected]>

…nd task-level TDD (gsd-build#736)

- gsd-executor: Add <analysis_paralysis_guard> block after deviation_rules. If executor makes 5+ consecutive Read/Grep/Glob calls without any Edit/Write/Bash action, it must stop and either write or report blocked. Prevents infinite analysis loops that stall execution.
- gsd-plan-checker: Add exhaustive cross-check in Step 4 requirement coverage. Checker now also reads PROJECT.md requirements (not just phase goal) to verify no relevant requirement is silently dropped. Unmapped requirements become automatic blockers listed explicitly in issues.
- gsd-planner: Add task-level TDD guidance alongside existing TDD Detection. For code-producing tasks in standard plans, tdd="true" + <behavior> block makes test expectations explicit before implementation. Complements the existing dedicated TDD plan approach; both can coexist.

Co-authored-by: CyPack <GITHUB_EMAIL_ADRESIN>
Co-authored-by: Claude Sonnet 4.6 <[email protected]>

Tibsfox · 2026-03-03T08:54:22Z

Code Logic Verification of PR #4

PR: feat: add Codex parity, discuss-phase scouting, and agent quality guards
Branch: feat/agent-discuss-codex-enhancements → main
Commits: bf26f95 (Codex parity, +710 lines) + 0dc8120 (discuss scouting, +104 lines) + 9124906 (agent guards, +32 lines)
Status: All 3 commits already cherry-picked to main (1455931, 37582f8, aaea14e)

1. The Before State (Bug Paths)

Codex gap: The Codex runtime adapter had a minimal 6-line skill header with no AskUserQuestion mapping and no multi-agent configuration. GSD workflows that rely on interactive questioning (discuss-phase) or agent spawning (execute-phase) could not function on Codex. Agent .md files were converted with basic markdown transforms but lacked the <codex_agent_role> header and per-agent .toml sandbox configs.

Blind discuss-phase: The discuss-phase generated gray areas purely from the ROADMAP.md phase description without examining the actual codebase. It could not suggest reusing existing components, highlight established patterns, or annotate options with code context — leading to decisions that ignored what was already built.

Agent execution drift: Three recurring failure modes:

Executors entering read-loops (5+ consecutive Read/Grep/Glob calls) without writing code, consuming context budget on analysis
Plan-checker verifying only the phase goal while silently dropping broader PROJECT.md requirements
Code-producing tasks lacking test expectations, causing implementation-first code without behavioral contracts

2. The After State (Fix Paths)

Commit 1 — bin/install.js (+298 lines) + tests/codex-config.test.cjs (+412 lines)

Full Codex adapter expansion with 8 new functions/constants:

CODEX_AGENT_SANDBOX: Map of 11 agents → sandbox modes (9 workspace-write, 2 read-only). Unknown agents default to read-only (safe default).
getCodexSkillAdapterHeader(): Three-section block — (A) skill invocation syntax, (B) AskUserQuestion → request_user_input parameter mapping with multiSelect workaround and Execute mode fallback, (C) Task() → spawn_agent mapping with parallel fan-out and result markers.
convertClaudeAgentToCodexAgent(): Adds <codex_agent_role> header (role/tools/purpose), cleans frontmatter (drops color/tools). Handles missing frontmatter gracefully.
generateCodexAgentToml(): Per-agent .toml with sandbox_mode and developer_instructions.
generateCodexConfigBlock(): [features] (multi_agent, request_user_input) + [agents] (max_threads=4, max_depth=2) + per-agent sections.
stripGsdFromCodexConfig(): Marker-based and non-marker-based cleanup paths. Preserves user content, removes GSD content, returns null if nothing remains.
mergeCodexConfig(): Three-case merge (new file, existing+marker, existing without marker with feature injection).
installCodexConfig(): Orchestrates agent discovery → .toml generation → config merge.
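
The merge behaviour can be approximated with a marker-delimited managed block plus feature-key injection. The sketch below is a simplified stand-in, not the `mergeCodexConfig()` implementation: the marker text, the `config_file` key, the `agents/<name>.toml` path, and the `true` values for the feature flags are all assumptions. It folds the three cases into one path (drop any old managed block, ensure the feature keys, append a fresh block), which keeps repeated runs idempotent.

```javascript
// Hypothetical marker; the real marker text is not shown in this PR.
const GSD_MARKER = '# GSD Agent Configuration (managed by installer)';
const FEATURE_KEYS = ['multi_agent', 'default_mode_request_user_input'];

// Managed block: [agents] limits plus one section per agent.
function buildAgentsBlock(agentNames) {
  const lines = [GSD_MARKER, '[agents]', 'max_threads = 4', 'max_depth = 2'];
  for (const name of agentNames) {
    // "config_file" is an assumed key name for the per-agent .toml reference.
    lines.push('', `[agents.${name}]`, `config_file = "agents/${name}.toml"`);
  }
  return lines.join('\n') + '\n';
}

// Add any missing feature keys, reusing an existing [features] table if present.
function ensureFeatures(text) {
  const lines = text.split('\n');
  const start = lines.findIndex((l) => l.trim() === '[features]');
  if (start === -1) {
    const head = text.trimEnd();
    const section = ['[features]', ...FEATURE_KEYS.map((k) => `${k} = true`)].join('\n');
    return (head ? head + '\n\n' : '') + section + '\n';
  }
  let end = lines.findIndex((l, i) => i > start && l.trim().startsWith('['));
  if (end === -1) end = lines.length;
  const body = lines.slice(start + 1, end);
  const missing = FEATURE_KEYS.filter(
    (k) => !body.some((l) => new RegExp(`^\\s*${k}\\s*=`).test(l))
  );
  lines.splice(start + 1, 0, ...missing.map((k) => `${k} = true`));
  return lines.join('\n');
}

// New file, existing file with marker, and existing file without marker
// all flow through the same steps.
function mergeConfig(existing, agentNames) {
  let base = existing ?? '';
  const markerAt = base.indexOf(GSD_MARKER);
  if (markerAt !== -1) base = base.slice(0, markerAt); // drop the old managed block
  base = ensureFeatures(base).trimEnd();
  return base + '\n\n' + buildAgentsBlock(agentNames);
}

console.log(mergeConfig(null, ['gsd-executor']));
```

A matching uninstall step would need to do more than cut at the marker: it would also have to remove the injected feature keys, which sit above the marker in the user's own `[features]` table.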

30 tests across 8 suites covering all functions, edge cases (no frontmatter, unknown agents, idempotency, GSD-only cleanup), and one integration test with real agent files.

Commit 2 — discuss-phase (+77 lines workflow, +14 lines template, +13 lines command)

New scout_codebase step inserted between check_existing and analyze_phase:

Checks .planning/codebase/*.md maps first (CONVENTIONS, STRUCTURE, STACK)
Falls back to targeted grep using phase goal terms
Builds internal <codebase_context> (reusable assets, established patterns, integration points, creative options)

analyze_phase now uses both prior_decisions and codebase_context. present_gray_areas includes code context annotations. discuss_areas integrates Context7 for library-specific questions (scoped: "only when library-specific knowledge improves the options"). context.md template gains <code_context> section with Reusable Assets, Established Patterns, Integration Points subsections.

commands/gsd/discuss-phase.md adds Glob, Grep, Task, and mcp__context7__* to allowed-tools.

Commit 3 — 3 agent files (+32 lines)

gsd-executor.md: <analysis_paralysis_guard> — after 5+ consecutive Read/Grep/Glob without Edit/Write/Bash, executor must stop and either write code or report "blocked" with the specific missing information. Placed after deviation_rules, before authentication_gates.
gsd-plan-checker.md: Exhaustive cross-check in Step 4 — also reads PROJECT.md requirements, not just phase goal. Unmapped PROJECT.md requirements relevant to the phase are automatic blockers.
gsd-planner.md: Task-level TDD — tdd="true" attribute + <behavior> block for code-producing tasks in standard plans. Exception list: checkpoint tasks, config-only, docs, migrations, glue code, styling-only. Complements (doesn't conflict with) existing dedicated TDD Detection.

3. Key Correctness Checks

| # | Check | Result | Notes |
|---|---|---|---|
| **Commit 1: Codex Parity** | | | |
| 1 | All 8 functions exist | ✅ PASS | Verified in install.js: CODEX_AGENT_SANDBOX, getCodexSkillAdapterHeader, convertClaudeAgentToCodexAgent, generateCodexAgentToml, generateCodexConfigBlock, stripGsdFromCodexConfig, mergeCodexConfig, installCodexConfig |
| 2 | CODEX_AGENT_SANDBOX has 11 agents | ✅ PASS | 9 workspace-write + 2 read-only (plan-checker, integration-checker). Matches actual agent tool declarations |
| 3 | Sandbox defaults | ✅ PASS | Unknown agents → `read-only` (safe default, line 545) |
| 4 | Three-case config merge | ✅ PASS | New file / existing+marker / existing without marker. Idempotent (tested) |
| 5 | Clean uninstall | ✅ PASS | stripGsdFromCodexConfig handles both marker-based and injected-key paths, preserves user content, cleans empty sections |
| 6 | Test-mode isolation | ✅ PASS | `GSD_TEST_MODE` env var gates exports vs CLI execution |
| 7 | Test coverage | ✅ PASS | 30 tests / 8 suites. All functions covered. Edge cases: no frontmatter, unknown agents, idempotency, GSD-only cleanup, feature injection |
| 8 | Tests pass | ✅ PASS | 30/30 pass, 0 failures, 0 skipped (verified by running `node --test tests/codex-config.test.cjs`) |
| **Commit 2: Discuss Scouting** | | | |
| 9 | scout_codebase placement | ✅ PASS | After check_existing + load_prior_context, before analyze_phase |
| 10 | Codebase maps → grep fallback chain | ✅ PASS | Checks .planning/codebase/*.md first, falls back to targeted grep |
| 11 | codebase_context structure | ✅ PASS | Reusable assets, established patterns, integration points, creative options |
| 12 | analyze_phase uses context | ✅ PASS | Explicit: "Use both prior_decisions and codebase_context to ground the analysis" |
| 13 | Code context annotations | ✅ PASS | Examples with existing component references in gray area descriptions |
| 14 | Context7 scoping | ✅ PASS | Only for library selection questions, not every question |
| 15 | context.md template | ✅ PASS | `<code_context>` with 3 subsections matches workflow output |
| 16 | allowed-tools updated | ✅ PASS | Glob, Grep, Task, mcp__context7__* added |
| **Commit 3: Agent Guards** | | | |
| 17 | Paralysis guard threshold | ✅ PASS | 5+ consecutive Read/Grep/Glob without Edit/Write/Bash |
| 18 | Paralysis guard escape hatch | ✅ PASS | Stop + write code OR report "blocked" with missing info |
| 19 | Guard placement | ✅ PASS | After deviation_rules, before authentication_gates |
| 20 | Exhaustive cross-check scope | ✅ PASS | Step 4 reads PROJECT.md + phase goal. Unmapped = automatic blocker |
| 21 | Task-level TDD structure | ✅ PASS | `tdd="true"` + `<behavior>` block with XML example |
| 22 | TDD exception list | ✅ PASS | 6 exceptions: checkpoint, config-only, docs, migrations, glue, styling |
| 23 | TDD vs existing TDD Detection | ✅ PASS | Complementary: dedicated plans for heavy logic, task-level for standard plans |
| 24 | Exhaustive cross-check false positives | ⚠️ CONCERN | "relevant to this phase" is subjective + automatic blocker severity. Could flag tangentially-related PROJECT.md requirements |
| 25 | Paralysis guard false trigger | ⚠️ LOW | Complex codebases could hit 5 reads legitimately, but "report blocked" escape hatch mitigates |

4. Verdict

PASS — All three commits implement their stated objectives correctly. The Codex adapter is comprehensive (8 functions, 30 passing tests, clean install/uninstall paths with idempotency). The discuss-phase scouting adds genuine value by grounding gray areas in actual codebase state. The agent quality guards address real failure modes with well-designed escape hatches.

All 3 commits confirmed present on main via cherry-pick (commits 1455931, 37582f8, aaea14e). The PR branch feat/agent-discuss-codex-enhancements is stale — all changes are already merged.

Noted concerns (non-blocking):

Exhaustive cross-check "relevant to this phase" + automatic blocker could generate false positives from tangentially related PROJECT.md requirements. Consider adding: "A requirement is relevant if the phase goal directly implies it."
Paralysis guard 5-read threshold could trigger on legitimately complex codebases, but the "report blocked" escape hatch prevents forced premature writes.

Upstream: gsd-build/get-shit-done#811 — open, CONFLICTING. All changes already on upstream main. Recommend closing both PRs.

Verification performed using parallel code trace and adversarial review teams. Codex adapter tests independently executed and confirmed 30/30 passing.

- Tibsfox ^.^

Tibsfox · 2026-03-03T08:54:32Z

Closing: all 3 commits confirmed present on main via cherry-picks (1455931, 37582f8, aaea14e). Full Code Logic Verification posted above. - Tibsfox ^.^

glittercowboy and others added 3 commits February 28, 2026 02:23

Tibsfox requested a review from glittercowboy as a code owner February 28, 2026 10:42

Tibsfox mentioned this pull request Mar 3, 2026

feat: add Codex parity, discuss-phase scouting, and agent quality guards Tibsfox/get-shit-done#4

Closed

Tibsfox closed this Mar 3, 2026

Tibsfox wants to merge 3 commits into gsd-build:main from Tibsfox:feat/agent-discuss-codex-enhancements

4 participants
